Abstract. The performance of two pivoting algorithms, due to Lemke
and Cottle and Dantzig, is studied on linear complementarity problems 
(LCPs) that arise from infinite games, such as parity, average-reward, 
and discounted games. The algorithms have not been previously studied 
in the context of infinite games, and they offer alternatives to the classi- 
cal strategy-improvement algorithms. The two algorithms are described 
purely in terms of discounted games, thus bypassing the reduction from 
the games to LCPs, and hence facilitating a better understanding of the 
algorithms when applied to games. A family of parity games is given, 
on which both algorithms run in exponential time, indicating that in 
the worst case they perform no better for parity, average-reward, or dis- 
counted games than they do for general P-matrix LCPs. 

1 Introduction 

In this paper we consider infinite-duration zero-sum games played on finite 
graphs, such as parity, average-reward, and discounted games. Parity games are 
important in the theory of algorithmic formal verification because they provide 
a combinatorial characterization of the meaning of nested inductive and
co-inductive definitions, as formalized in the modal μ-calculus and other fixpoint
logics [12]. In particular, deciding the winner in parity games is
polynomial-time equivalent to checking non-emptiness of non-deterministic parity tree
automata, and to the modal μ-calculus model checking, two fundamental
algorithmic problems in automata theory, logic, and verification [7,18,12]. Discounted
and average-reward games have been introduced by Shapley [17] and Gillette [11] 
in the 1950s, and they have been extensively studied in the game theory, math- 
ematical programming, algorithms, and AI communities [21,8]. Parity, average- 
reward, and discounted games have an intriguing complexity-theoretic status. 
The problems of deciding the winner in these games are some of the few known 
combinatorial problems in NP ∩ co-NP (and even UP ∩ co-UP [13]) that are
not known to be solvable in polynomial time. 

The linear complementarity problem (LCP) is a fundamental problem in 
mathematical programming. It naturally captures equilibrium problems, as well 
as the complementary slackness and Karush-Kuhn-Tucker conditions of linear 
and quadratic programming, respectively. The monograph of Cottle et al. [6]
is the authoritative source on the LCP. In general, deciding if an LCP has a
solution is NP-complete [3]. If, however, the matrix (which is a part of the LCP
input) is a P-matrix (i.e., if all its principal minors are positive) then the problem
is arguably easier computationally. Every P-matrix LCP (P-LCP) has a unique
solution and computing it is in PLS ∩ PPAD. A significant amount of effort has
been invested by the mathematical programming community towards finding an 
efficient algorithm for the P-LCP, which has led to a wide body of literature 
in this area. Polynomial-time reductions from simple stochastic games [10, 19]
and discounted games [14] to the P-LCP have recently been proposed; however,
the techniques commonly used to solve P-LCPs remain largely unknown in the
infinite games community. It is possible that these techniques could shed new 
light on the computational complexity of solving infinite games. 

In this paper we consider two classical pivoting algorithms for the P-LCP, 
Lemke's algorithm and the Cottle-Dantzig principal pivoting algorithm, and we 
study their performance on P-LCPs obtained from discounted games by the 
reduction of Jurdziński and Savani [14]. Our first main contribution is to describe
both algorithms purely as a process that works on the original discounted game, 
bypassing the reduction from games to the P-LCP, and hence we facilitate their 
analysis without the need to consider or understand concepts of the LCP theory. 
We present the algorithms for discounted games because they have technical 
advantages that make the exposition particularly transparent [14]. We argue, 
however, that this is done without loss of generality: the algorithms can be 
readily applied to parity games and average-reward games because there are 
transparent polynomial-time reductions from parity games to average-reward 
games [16, 13], and from average-reward games to discounted games [21]. 

It has long been known that the two algorithms can take exponential time 
when applied to P-LCPs. However, it is not known whether these lower bounds 
hold for the LCPs that arise from infinite games. Our second main contribution 
is to prove that there is a family of discounted games on which the algorithms 
of Lemke, and Cottle and Dantzig run in exponential time, and hence we
indicate limitations of the classical LCP theory in the context of infinite games. Our
family of examples is derived from those given by Björklund and Vorobyov [2]
for their strategy improvement algorithm for average-reward games. For tech- 
nical convenience and without loss of generality, we present these families of 
hard examples as discounted games; it is easy to construct parity and average- 
reward games from which those discounted games are obtained via the standard 
reductions [16, 13, 21]. 

We stress that these lower bounds are not fatal. The lower bound for Lemke's 
algorithm requires a specific covering vector and the lower bound for the Cottle- 
Dantzig algorithm relies on a specific choice of ordering on the vertices. The 
covering vector and the ordering are free choices left up to the user of the al- 
gorithm. This situation can be compared to the classical strategy improvement 
algorithms for infinite games [5,20,2]. It has long been known that these al- 
gorithms can be made to run in exponential time by choosing sufficiently bad 
vertices to switch [15]. However, it has only recently been shown that reasonable 
switching policies can be made to run in exponential time [9]. The complexity of
our algorithms when equipped with reasonable covering vectors and reasonable
orderings remains open. The literature on P-LCPs contains exciting complexity 
results for special cases. For example, Adler and Megiddo [1] studied the
performance of Lemke's algorithm for the LCPs arising from linear programming
problems. They showed that for randomly chosen linear programs and a care- 
fully selected covering vector, the expected number of pivots performed by the 
algorithm is quadratic. Our results open the door to extending such analyses to 
infinite games. 

2 Preliminaries 

A binary discounted game is given by a tuple
$G = (V, V_{\mathrm{Max}}, V_{\mathrm{Min}}, \lambda, \rho, r^\lambda, r^\rho, \beta)$,
where $V$ is a set of vertices and $V_{\mathrm{Max}}$ and $V_{\mathrm{Min}}$ partition $V$ into the set of vertices
of player Max and the set of vertices of player Min, respectively. Each vertex
has exactly two outgoing edges which are given by the left and right successor
functions $\lambda, \rho : V \to V$. Each edge has a reward associated with it given by the
functions $r^\lambda, r^\rho : V \to \mathbb{R}$. Finally the discount factor $\beta$ is such that $0 \le \beta < 1$.

The game begins with a token on a starting vertex $v_0$. In each round, the
player who owns the vertex on which the token is placed chooses one of the
two successors of that vertex and moves the token to that successor. In this
fashion the two players form an infinite path $\pi = \langle v_0, v_1, v_2, \dots \rangle$ where $v_{i+1}$ is
equal to either $\lambda(v_i)$ or $\rho(v_i)$. The path yields the infinite sequence of rewards
$\langle r_0, r_1, r_2, \dots \rangle$, where $r_i = r^\lambda(v_i)$ if $\lambda(v_i) = v_{i+1}$, and $r_i = r^\rho(v_i)$ otherwise. The
payoff of an infinite path is denoted by $\mathcal{D}(\pi) = \sum_{i=0}^{\infty} \beta^i r_i$. Since the game is
zero-sum, player Max wins $\mathcal{D}(\pi)$ and player Min loses an equal amount.
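
As a quick sanity check of the payoff definition, consider a path on which every edge carries the same reward $r$. This worked example is an illustration added here, with the numbers $r = 3$ and $\beta = 1/2$ chosen arbitrarily; it only uses the geometric series.

```latex
\mathcal{D}(\pi) \;=\; \sum_{i=0}^{\infty} \beta^{i} r \;=\; \frac{r}{1-\beta},
\qquad \text{e.g. } r = 3,\ \beta = \tfrac{1}{2}
\ \Longrightarrow\ \mathcal{D}(\pi) = \frac{3}{1 - \tfrac{1}{2}} = 6 .
```

The same argument shows that if every reward has magnitude at most $R$ then $|\mathcal{D}(\pi)| \le R/(1-\beta)$, so the payoff of every infinite path is finite whenever $\beta < 1$.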

A positional strategy for player Max is a function that, for each vertex
belonging to player Max, chooses one of the two successors of the vertex. The
strategy is denoted by $\chi : V_{\mathrm{Max}} \to V$ with the condition that, for every vertex $v$
in $V_{\mathrm{Max}}$, the function $\chi(v)$ is equal to either $\lambda(v)$ or $\rho(v)$. Positional strategies
for player Min are defined analogously. The sets of pure positional strategies for
Max and Min are denoted by $\Pi_{\mathrm{Max}}$ and $\Pi_{\mathrm{Min}}$, respectively. Given a pair of
positional strategies, $\chi$ and $\mu$ for Max and Min respectively, and an initial vertex $v_0$,
there is a unique infinite path $\langle v_0, v_1, v_2, \dots \rangle$, where $\chi(v_i) = v_{i+1}$ if $v_i$ is in $V_{\mathrm{Max}}$
and $\mu(v_i) = v_{i+1}$ if $v_i$ is in $V_{\mathrm{Min}}$. This path, referred to as the play induced by
the two strategies, will be denoted by $\mathrm{Play}(\chi, \mu, v_0)$.

For all $v$ in $V$, we define
$\mathrm{Val}_*(v) = \max_{\chi \in \Pi_{\mathrm{Max}}} \min_{\mu \in \Pi_{\mathrm{Min}}} \mathcal{D}(\mathrm{Play}(\chi, \mu, v))$,
and $\mathrm{Val}^*(v) = \min_{\mu \in \Pi_{\mathrm{Min}}} \max_{\chi \in \Pi_{\mathrm{Max}}} \mathcal{D}(\mathrm{Play}(\chi, \mu, v))$.
These will be known as the lower and upper values of $v$, respectively. It is always true that
$\mathrm{Val}_*(v) \le \mathrm{Val}^*(v)$. It is well known that for discounted games the two values are equal, a
property known as determinacy.

Theorem 1 ([17]). For every discounted game $G$ and every vertex $v \in V$, we
have $\mathrm{Val}_*(v) = \mathrm{Val}^*(v)$.

The value of the game starting at a vertex $v$, equal to both $\mathrm{Val}_*(v)$ and $\mathrm{Val}^*(v)$, is
denoted by $\mathrm{Val}(v)$. The computational task associated with discounted games is
to compute $\mathrm{Val}(v)$. Moreover, we want to find optimal strategies, i.e., a strategy $\chi$
for Max and a strategy $\mu$ for Min that each guarantee the value $\mathrm{Val}(v)$ for their player.

For convenience, we introduce the concept of a joint strategy $\sigma : V \to V$ that
specifies moves for both players. The notation $\sigma \upharpoonright \mathrm{Max}$ and $\sigma \upharpoonright \mathrm{Min}$ will be used
to refer to the individual strategies of Max and Min that constitute the joint
strategy. For a vertex $v$, the function $\bar\sigma(v)$ gives the successor of $v$ not chosen
by $\sigma$. The functions $r^\sigma$ and $r^{\bar\sigma}$ give the reward on the edge chosen by $\sigma$ and the
reward on the edge not chosen by $\sigma$, respectively. The path denoted by $\mathrm{Play}(\sigma, v)$
is equal to the path $\mathrm{Play}(\sigma \upharpoonright \mathrm{Max}, \sigma \upharpoonright \mathrm{Min}, v)$. The joint strategy is optimal if
both $\sigma \upharpoonright \mathrm{Max}$ and $\sigma \upharpoonright \mathrm{Min}$ are optimal. For a given joint strategy $\sigma$, the value
of a vertex $v$ when $\sigma$ is played will be denoted by $\mathrm{Val}^\sigma(v) = \mathcal{D}(\mathrm{Play}(\sigma, v))$.

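Given a joint strategy $\sigma$ and a vertex $v$, the balance of $v$ is the difference
between the value of $v$ and the value of the play that starts at $v$, moves to $\bar\sigma(v)$
in the first step, and then follows $\sigma$,

$$\mathrm{Bal}^\sigma(v) = \begin{cases}
\mathrm{Val}^\sigma(v) - \bigl(r^{\bar\sigma}(v) + \beta \cdot \mathrm{Val}^\sigma(\bar\sigma(v))\bigr) & \text{if } v \in V_{\mathrm{Max}}, \\
\bigl(r^{\bar\sigma}(v) + \beta \cdot \mathrm{Val}^\sigma(\bar\sigma(v))\bigr) - \mathrm{Val}^\sigma(v) & \text{if } v \in V_{\mathrm{Min}}.
\end{cases} \tag{1}$$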

A vertex $v$ is said to be switchable under $\sigma$ if $\mathrm{Bal}^\sigma(v) < 0$. If $\mathrm{Bal}^\sigma(v) = 0$
for some vertex then that vertex is said to be indifferent. There is a simple
characterisation of optimality in terms of switchable vertices.

Theorem 2 ([17]). If no vertex is switchable in a joint strategy $\sigma$ then it is an
optimal strategy for every choice of starting vertex.
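
As an illustration of the balance and of Theorem 2, the following Python sketch computes the values and balances of a joint strategy and tests whether any vertex is switchable. The two-vertex game, the list-based encoding (`"L"`/`"R"` for the left and right successor), and all function names are assumptions made for this example; exact rational arithmetic is used so that a balance of zero can be tested exactly.

```python
from fractions import Fraction

def value(chosen, reward, beta, v):
    """Discounted payoff of the play from v when each vertex u moves to chosen[u]."""
    path, seen = [], {}
    while v not in seen:              # walk until a vertex repeats
        seen[v] = len(path)
        path.append(v)
        v = chosen[v]
    k = seen[v]                       # length of the initial simple path
    l = len(path) - k                 # length of the repeated cycle
    total = Fraction(0)
    for i, u in enumerate(path):
        # Cycle rewards recur every l steps, hence the geometric factor.
        weight = beta**i if i < k else beta**i / (1 - beta**l)
        total += weight * reward[u]
    return total

def balances(owner, succ, reward, beta, sigma):
    """Return (values, balances) for the joint strategy sigma."""
    n = len(owner)
    chosen = [succ[sigma[v]][v] for v in range(n)]
    chosen_reward = [reward[sigma[v]][v] for v in range(n)]
    val = [value(chosen, chosen_reward, beta, v) for v in range(n)]
    bal = []
    for v in range(n):
        other = "L" if sigma[v] == "R" else "R"
        # Value of deviating to the unchosen edge once, then following sigma.
        alt = reward[other][v] + beta * val[succ[other][v]]
        bal.append(val[v] - alt if owner[v] == "Max" else alt - val[v])
    return val, bal

def is_optimal(bal):
    """No vertex is switchable, i.e. no balance is negative."""
    return all(b >= 0 for b in bal)

# Hypothetical game: vertex 0 belongs to Max, vertex 1 to Min.
owner = ["Max", "Min"]
succ = {"L": [1, 0], "R": [0, 1]}
reward = {"L": [3, -1], "R": [0, 0]}
beta = Fraction(1, 2)

for sigma in (["R", "R"], ["L", "R"]):
    val, bal = balances(owner, succ, reward, beta, sigma)
    print(sigma, val, bal, is_optimal(bal))
```

In this example both vertices are switchable under the all-right strategy, while the strategy that switches only the Max vertex has no switchable vertex.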

The two algorithms that we will present use only positional joint strate- 
gies. From now on, all joint strategies that we refer to can be assumed to be 
positional joint strategies. If a play begins at a vertex $v$ and follows a
positional joint strategy $\sigma$ then the resulting infinite path can be represented
by a simple path followed by an infinitely repeated cycle. Let
$\mathrm{Play}(\sigma, v) = \langle v_0, v_1, \dots, v_{k-1}, \langle c_0, c_1, \dots, c_{l-1} \rangle^\omega \rangle$.
It is then easy to see that

$$\mathrm{Val}^\sigma(v) = \sum_{i=0}^{k-1} \beta^i \cdot r^\sigma(v_i) + \sum_{i=0}^{l-1} \frac{\beta^{k+i}}{1 - \beta^l} \cdot r^\sigma(c_i).$$

Therefore, the amount that the reward on the outgoing edge of a vertex u con- 
tributes towards the value of v can be defined as follows. 

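Definition 3 (Contribution Coefficient). For vertices $v$ and $u$, and for a
positional joint strategy $\sigma$, we define:

$$D^\sigma_v(u) = \begin{cases}
\beta^i & \text{if } u = v_i \text{ for some } 0 \le i < k, \\
\dfrac{\beta^{k+i}}{1 - \beta^l} & \text{if } u = c_i \text{ for some } 0 \le i < l, \\
0 & \text{otherwise.}
\end{cases}$$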
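
The contribution coefficients can be computed directly from the shape of the play. The sketch below is illustrative only: the three-vertex strategy, the encoding of a joint strategy as a list `chosen` of successors, and the function names are assumptions of this example. It recovers the value of a vertex as the coefficient-weighted sum of the rewards on the chosen edges.

```python
from fractions import Fraction

def play(chosen, v):
    """Split the play from v into its simple path and its repeated cycle."""
    path, seen = [], {}
    while v not in seen:
        seen[v] = len(path)
        path.append(v)
        v = chosen[v]
    k = seen[v]
    return path[:k], path[k:]

def contribution(chosen, beta, v):
    """Contribution coefficient of every vertex u towards the value of v."""
    prefix, cycle = play(chosen, v)
    k, l = len(prefix), len(cycle)
    coeff = [Fraction(0)] * len(chosen)     # vertices off the play contribute 0
    for i, u in enumerate(prefix):
        coeff[u] = beta**i
    for i, c in enumerate(cycle):
        coeff[c] = beta**(k + i) / (1 - beta**l)
    return coeff

def value(chosen, reward, beta, v):
    """Value of v as the coefficient-weighted sum of chosen edge rewards."""
    coeff = contribution(chosen, beta, v)
    return sum(d * r for d, r in zip(coeff, reward))

# Hypothetical strategy: 0 -> 1 -> 2 -> 1, so k = 1 and the cycle is (1, 2).
chosen = [1, 2, 1]
reward = [1, 2, 4]          # reward on the edge chosen at each vertex
beta = Fraction(1, 2)
print(contribution(chosen, beta, 0))
print(value(chosen, reward, beta, 0))
```

For vertex 0 the coefficients are 1, 2/3 and 1/3, which gives the value 1 + 4/3 + 4/3 = 11/3; the same number is obtained by unrolling the recursion Val(v) = r(v) + β · Val(σ(v)).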



3 Lemke's Algorithm For Discounted Games 



Lemke's algorithm is a classical algorithm for solving the linear complementarity 
problem [6]. We can apply Lemke's algorithm to a discounted game by utilising 
the reduction of Jurdziński and Savani [14]; however, this yields little insight
into how the algorithm works on a discounted game. In this section we bypass 
the reduction, and give a description of Lemke's algorithm entirely in terms of 
discounted games. 

Lemke's algorithm begins with the joint strategy $\sigma_0 = \rho$ that selects the right
successor for every vertex in the game. This is actually a free choice since the 
left and right successors can be swapped to obtain an arbitrary starting strategy. 
The algorithm will then move through a sequence of strategies until it arrives at 
the optimal strategy. The algorithm will also construct a modified game for each 
strategy that it considers. The modified games will take the following form. 

Definition 4 (Modified Game For Lemke's Algorithm). For a real number $z$,
we define the game $G_z$ to be the same as $G$ but with a modified left-edge
reward function, denoted by $r^\lambda_z$, and defined, for every vertex $v$, by:

$$r^\lambda_z(v) = \begin{cases}
r^\lambda(v) - z & \text{if } v \in V_{\mathrm{Max}}, \\
r^\lambda(v) + z & \text{if } v \in V_{\mathrm{Min}}.
\end{cases} \tag{2}$$

For a modified game $G_z$, the function $r^\sigma_z$ will give the rewards on the edges
chosen by $\sigma$. The notations $\mathrm{Val}^\sigma_z$ and $\mathrm{Bal}^\sigma_z$ will give the values and the balances of the
vertices in the game $G_z$, respectively. For every strategy $\sigma_i$ that is considered,
the algorithm must choose an appropriate value $z_i$ so that $\sigma_i$ is optimal in $G_{z_i}$.
Moreover, we want to choose the minimum value $z_i$ for which this property holds.
The next proposition shows how to compute this for the initial strategy $\sigma_0$.

Proposition 5. Let $z_0 = \max\{-\mathrm{Bal}^{\sigma_0}(v) : v \in V\}$. The strategy $\sigma_0$ is optimal
in $G_{z_0}$ and the vertex $v$ in $V$ that maximizes $-\mathrm{Bal}^{\sigma_0}(v)$ is indifferent. Moreover,
there is no value $y < z_0$ for which $\sigma_0$ is optimal in $G_y$.

Proposition 5 gives an initial value for the parameter $z$. The principal idea
behind the algorithm is to drive $z$ down from its initial value to 0, while
maintaining optimality of the current strategy in $G_z$. Unfortunately, Proposition 5
implies that we cannot drive $z$ down further without losing the optimality of $\sigma_0$
in $G_z$. We do however know that there is some vertex $v$ that is indifferent
under $\sigma_0$ in $G_{z_0}$. We define $\sigma_1 = \sigma_0[\bar\sigma_0(v)/v]$, i.e., $\sigma_1(u) = \bar\sigma_0(u)$ if $u = v$, and
$\sigma_1(u) = \sigma_0(u)$ otherwise. The operation of modifying a strategy by changing the
successor of a vertex $v$ will be referred to as switching $v$.

The value of no vertex changes when switching an indifferent vertex in a
strategy. Since $\sigma_0$ was optimal in $G_{z_0}$ and $v$ was indifferent we therefore have
that $\sigma_1$ is optimal in $G_{z_0}$. There is one important difference, however: whereas $z$
could not be decreased without $\sigma_0$ losing its optimality, the parameter $z$ can be
decreased further whilst maintaining the optimality of $\sigma_1$. The task now is to
find $z_1$, the minimum value of $z$ for which $\sigma_1$ is still optimal.
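
Proposition 5 can be checked by hand on a tiny instance. The following Python sketch does so under stated assumptions: the two-vertex game is hypothetical, the initial strategy picks a zero-reward self-loop at both vertices (so every value is 0 and each balance depends only on the reward of the unchosen left edge), and the helper names are invented for this example.

```python
from fractions import Fraction

def initial_z(bal0):
    """Initial parameter: the largest value of -Bal over all vertices."""
    return max(-b for b in bal0)

def modified_left_rewards(owner, r_left, z):
    """Left rewards of the modified game G_z (unit covering vector)."""
    return [r - z if o == "Max" else r + z for o, r in zip(owner, r_left)]

# Hypothetical game: vertex 0 belongs to Max, vertex 1 to Min, and the
# initial strategy loops on a zero-reward right edge at both vertices.
owner = ["Max", "Min"]
r_left = [Fraction(3), Fraction(-1)]

# With all values equal to 0, the balance of the Max vertex is -r_left[0]
# and the balance of the Min vertex is r_left[1].
bal0 = [-r_left[0], r_left[1]]                 # balances in G

z0 = initial_z(bal0)
new_left = modified_left_rewards(owner, r_left, z0)
bal_z0 = [-new_left[0], new_left[1]]           # balances in the modified game

print(z0, new_left, bal_z0)
```

Every balance rises by exactly the chosen parameter, so the vertex with the most negative balance becomes indifferent and no vertex remains switchable.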



At a high level, when the algorithm arrives at a strategy $\sigma_i$ its task is to
find $z_i$, the minimum value of $z$ for which $\sigma_i$ is optimal in $G_z$. As we shall show,
for this minimum value of $z$ there will always be at least one vertex that is
indifferent under $\sigma_i$ played in $G_z$. The algorithm then switches this indifferent
vertex to create $\sigma_{i+1}$ and the process is repeated. The remainder of this section
is dedicated to showing how $z_i$ can be computed.

Each step begins with a strategy $\sigma_i$ and the value $z_{i-1}$, which was the
minimum value of $z$ for which $\sigma_{i-1}$ was optimal in $G_z$. We now wish to know how
much further $z$ can be decreased before $\sigma_i$ ceases to be optimal. From Theorem 2
we know that a strategy is optimal as long as no vertex is switchable and that
a vertex is switchable only when it has a negative balance. It is for this reason
that we want to know how the balance of each vertex changes as $z$ is decreased.
In order to understand this, we must first know how the value of each vertex
changes as $z$ is decreased. We will use the notation $\partial_{-z} \mathrm{Val}^\sigma_z(v)$ to denote the
rate of change of the value of $v$ as $z$ decreases, i.e., $-\partial_z \mathrm{Val}^\sigma_z(v)$. This notation
will be used frequently throughout the rest of the paper to denote the rate of
change of various expressions. For a proposition $p$, we define $[p]$ to be equal to 1
if $p$ is true, and 0 otherwise. We can now give an explicit formula for $\partial_{-z} \mathrm{Val}^\sigma_z(v)$,
which is based on the left edges that are passed through after visiting the
vertex $v$ while playing the strategy $\sigma$, and the contribution coefficient of those edges
to the value of $v$.

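Proposition 6. For a vertex $v$ and a joint strategy $\sigma$, let $L$ be the set of vertices
for which $\sigma$ picks the left successor, $L = \{u \in V : \sigma(u) = \lambda(u)\}$. The rate of
change of the value of $v$ is

$$\partial_{-z} \mathrm{Val}^\sigma_z(v) = \sum_{u \in L} \bigl([u \in V_{\mathrm{Max}}] - [u \in V_{\mathrm{Min}}]\bigr) \cdot D^\sigma_v(u).$$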
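
The formula of Proposition 6 can be compared against a finite difference, since for a fixed strategy the value is affine in $z$. The Python sketch below is a hedged illustration: the two-vertex game, the `"L"`/`"R"` encoding, and the function names are assumptions of this example, and the modified left rewards follow Definition 4 with a unit covering vector.

```python
from fractions import Fraction

def contribution(chosen, beta, v):
    """Contribution coefficients of all vertices towards the value of v."""
    path, seen = [], {}
    while v not in seen:
        seen[v] = len(path)
        path.append(v)
        v = chosen[v]
    k, l = seen[v], len(path) - seen[v]
    coeff = [Fraction(0)] * len(chosen)
    for i, u in enumerate(path):
        coeff[u] = beta**i if i < k else beta**i / (1 - beta**l)
    return coeff

def value_rate(owner, sigma, chosen, beta, v):
    """Rate of change of the value of v as z decreases (Proposition 6)."""
    coeff = contribution(chosen, beta, v)
    rate = Fraction(0)
    for u, side in enumerate(sigma):
        if side == "L":                     # u belongs to the set L
            rate += coeff[u] if owner[u] == "Max" else -coeff[u]
    return rate

def value_in_Gz(owner, sigma, chosen, r_left, r_right, beta, z, v):
    """Value of v under sigma in the modified game G_z."""
    coeff = contribution(chosen, beta, v)
    total = Fraction(0)
    for u, side in enumerate(sigma):
        if side == "L":
            r = r_left[u] - z if owner[u] == "Max" else r_left[u] + z
        else:
            r = r_right[u]
        total += coeff[u] * r
    return total

# Hypothetical game in which both vertices currently use their left edge.
owner = ["Max", "Min"]
sigma = ["L", "L"]
chosen = [1, 0]                   # left successors of vertices 0 and 1
r_left, r_right = [3, -1], [0, 0]
beta = Fraction(1, 2)

rate = value_rate(owner, sigma, chosen, beta, 0)
diff = (value_in_Gz(owner, sigma, chosen, r_left, r_right, beta, 1, 0)
        - value_in_Gz(owner, sigma, chosen, r_left, r_right, beta, 2, 0))
print(rate, diff)
```

Decreasing $z$ by one unit raises the value of vertex 0 by 4/3 − 2/3 = 2/3 here, and the finite difference agrees with the formula.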

From equation (1) we know that the balance of a vertex is computed as a 
difference of the values of two vertices. We now show how the rate of change of 
the balance can be derived by substituting the rate of change of the values into 
equation (1). 

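Proposition 7. For a vertex $v$ and a joint strategy $\sigma$ we have

$$\partial_{-z} \mathrm{Bal}^\sigma_z(v) = \begin{cases}
\partial_{-z} \mathrm{Val}^\sigma_z(v) - \bigl([\bar\sigma(v) = \lambda(v)] + \beta \cdot \partial_{-z} \mathrm{Val}^\sigma_z(\bar\sigma(v))\bigr) & \text{if } v \in V_{\mathrm{Max}}, \\
-[\bar\sigma(v) = \lambda(v)] + \beta \cdot \partial_{-z} \mathrm{Val}^\sigma_z(\bar\sigma(v)) - \partial_{-z} \mathrm{Val}^\sigma_z(v) & \text{if } v \in V_{\mathrm{Min}}.
\end{cases}$$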

Now that we have an expression for the rate of change of the balance of a
vertex, we can compute how far $z$ can be decreased from $z_{i-1}$ before some vertex
gets a negative balance. For each vertex $v$ whose balance decreases as $z$ decreases,
the expression $-\mathrm{Bal}^{\sigma_i}_{z_{i-1}}(v) / \partial_{-z} \mathrm{Bal}^{\sigma_i}_z(v)$
gives the amount that $z$ can be decreased before $v$ gets a negative balance, and
so the minimum over all these ratios gives the amount that $z$ can be decreased
before some vertex gets a negative balance. It should also be clear that a vertex
that achieves this minimum will be indifferent in the modified game when $z$ is
decreased by this amount. We can also show that this is the minimum value of $z$
for which $\sigma_i$ is optimal in $G_z$.



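Proposition 8. Let a joint strategy $\sigma_i$ be optimal in the modified game $G_{z_{i-1}}$,
and

$$z_i = z_{i-1} - \min\left\{ -\frac{\mathrm{Bal}^{\sigma_i}_{z_{i-1}}(v)}{\partial_{-z} \mathrm{Bal}^{\sigma_i}_z(v)} : v \in V \text{ and } \partial_{-z} \mathrm{Bal}^{\sigma_i}_z(v) < 0 \right\}. \tag{3}$$

Then strategy $\sigma_i$ is optimal in $G_{z_i}$, and it is not optimal in $G_x$ for all $x < z_i$.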
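
The ratio test of Proposition 8 is a one-line minimisation once the balances and their rates of change are known. The sketch below is illustrative: the function name `ratio_test`, the convention of returning `(None, None)` when no balance decreases, and the numerical data are all assumptions of this example.

```python
from fractions import Fraction

def ratio_test(z_prev, bal, rate):
    """Return the new parameter value and the vertex that becomes indifferent.

    bal[v] is the balance of v at z_prev and rate[v] its rate of change as z
    decreases. Only vertices whose balance decreases (rate < 0) can block.
    """
    candidates = [(-bal[v] / rate[v], v) for v in range(len(bal)) if rate[v] < 0]
    if not candidates:
        return None, None          # no vertex ever gets a negative balance
    step, v = min(candidates)
    return z_prev - step, v

# Hypothetical data: balances at z = 3 and their rates of change.
z_prev = Fraction(3)
bal = [Fraction(0), Fraction(2), Fraction(1)]
rate = [Fraction(1, 2), Fraction(-1, 2), Fraction(-1)]

z_new, blocking = ratio_test(z_prev, bal, rate)
# Balances are affine in z, so they can be updated directly.
new_bal = [b + (z_prev - z_new) * r for b, r in zip(bal, rate)]
print(z_new, blocking, new_bal)
```

Vertex 1 could tolerate a decrease of 4 and vertex 2 a decrease of 1, so the parameter drops from 3 to 2 and vertex 2 becomes indifferent while all other balances stay non-negative.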

Until now, we have ignored the possibility of reaching a strategy $\sigma$ in which
there is more than one indifferent vertex. In LCP algorithms this is known as
a degenerate step. In this case, the task is to find a strategy in which every
indifferent vertex $v$ satisfies $\partial_{-z} \mathrm{Bal}^\sigma_z(v) > 0$, so that $z$ can be decreased further.
It is not difficult to prove that such a strategy can be reached by switching 
only the indifferent vertices. One method for degeneracy resolution is Bland's 
rule, which uses the least index method to break ties, and another is to use 
lexicographic perturbations; both methods are well-known, and are also used 
with the simplex method for linear programming [4]. 



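Algorithm 1 Lemke($G$)

  $i := 0$; $\sigma_0 := \rho$; $z_0 := \max\{-\mathrm{Bal}^{\sigma_0}(v) : v \in V\}$
  while $z_i > 0$ do
    $\sigma_{i+1} := \sigma_i[\bar\sigma_i(v)/v]$ for some vertex $v$ with $\mathrm{Bal}^{\sigma_i}_{z_i}(v) = 0$
    $z_{i+1} := z_i - \min\{ -\mathrm{Bal}^{\sigma_{i+1}}_{z_i}(v) / \partial_{-z} \mathrm{Bal}^{\sigma_{i+1}}_z(v) : v \in V \text{ and } \partial_{-z} \mathrm{Bal}^{\sigma_{i+1}}_z(v) < 0 \}$
    $i := i + 1$
  end while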



Lemke's algorithm is shown as Algorithm 1. Since in each step we know that
there is no value of $z < z_i$ for which $\sigma_i$ is optimal in $G_z$ and we decrease $z$
in every step it follows that we can never visit the same strategy twice
without violating the condition that the current strategy should be optimal in the
modified game. Therefore the algorithm must terminate after at most $2^{|V|}$ steps,
which corresponds to the total number of joint strategies. The algorithm can
only terminate when $z$ has reached 0, and $G_0$ is the same game as $G$. It follows
that whatever strategy the algorithm terminates with must be optimal in the
original game.

Theorem 9. Algorithm 1 terminates, with a joint strategy $\sigma$ that is optimal
for $G$ after at most $2^{|V|}$ iterations.
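
The whole procedure can be sketched in a few dozen lines of Python. This is a minimal illustration under several assumptions, not a reference implementation: it uses a unit covering vector, assumes that no degenerate step occurs, obtains each rate of change as an exact finite difference (balances are affine in $z$ for a fixed strategy) instead of evaluating Proposition 7, and stops decreasing $z$ as soon as it would pass 0. The game encoding and the function names are invented for this example.

```python
from fractions import Fraction

def value(chosen, reward, beta, v):
    """Discounted payoff of the play from v under the successor map chosen."""
    path, seen = [], {}
    while v not in seen:
        seen[v] = len(path)
        path.append(v)
        v = chosen[v]
    k, l = seen[v], len(path) - seen[v]
    total = Fraction(0)
    for i, u in enumerate(path):
        total += (beta**i if i < k else beta**i / (1 - beta**l)) * reward[u]
    return total

def balances(owner, succ, reward, beta, sigma, z):
    """Balances of all vertices under sigma in the modified game G_z."""
    n = len(owner)
    left = [reward["L"][v] - z if owner[v] == "Max" else reward["L"][v] + z
            for v in range(n)]
    rz = {"L": left, "R": reward["R"]}
    chosen = [succ[sigma[v]][v] for v in range(n)]
    chosen_reward = [rz[sigma[v]][v] for v in range(n)]
    val = [value(chosen, chosen_reward, beta, v) for v in range(n)]
    bal = []
    for v in range(n):
        other = "L" if sigma[v] == "R" else "R"
        alt = rz[other][v] + beta * val[succ[other][v]]
        bal.append(val[v] - alt if owner[v] == "Max" else alt - val[v])
    return bal

def lemke(owner, succ, reward, beta):
    """Sketch of Algorithm 1; assumes no degenerate steps occur."""
    n = len(owner)
    sigma = ["R"] * n                                   # initial strategy
    bal = balances(owner, succ, reward, beta, sigma, 0)
    z = max(-b for b in bal)                            # Proposition 5
    blocking = bal.index(min(bal))                      # indifferent vertex
    while z > 0:
        sigma[blocking] = "L" if sigma[blocking] == "R" else "R"
        now = balances(owner, succ, reward, beta, sigma, z)
        nxt = balances(owner, succ, reward, beta, sigma, z - 1)
        rate = [b1 - b0 for b0, b1 in zip(now, nxt)]    # change per unit decrease
        cands = [(-now[v] / rate[v], v) for v in range(n) if rate[v] < 0]
        if not cands:                                   # nothing blocks: go to 0
            z = 0
            break
        step, blocking = min(cands)                     # ratio test
        z = max(0, z - step)                            # never pass z = 0
    return sigma

# Hypothetical game: vertex 0 belongs to Max, vertex 1 to Min.
owner = ["Max", "Min"]
succ = {"L": [1, 0], "R": [0, 1]}
reward = {"L": [3, -1], "R": [0, 0]}
beta = Fraction(1, 2)
print(lemke(owner, succ, reward, beta))
```

On this game the parameter starts at 3, the Max vertex is switched, and the ratio test then allows $z$ to fall all the way to 0, so the sketch stops after one switch with a strategy that has no switchable vertex in the original game.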

Lemke's algorithm for LCPs allows a free choice of covering vector, and in
our description we used a unit covering vector. This can be generalised by giving
a positive covering value to every vertex. If each vertex $v$ has a covering value $d_v$
then the modification of the left edges in Definition 4 becomes:

$$r^\lambda_z(v) = \begin{cases}
r^\lambda(v) - d_v \cdot z & \text{if } v \in V_{\mathrm{Max}}, \\
r^\lambda(v) + d_v \cdot z & \text{if } v \in V_{\mathrm{Min}}.
\end{cases}$$

The algorithm can then easily be modified to account for this altered definition.



4 The Cottle-Dantzig Algorithm For Discounted Games 

The principal idea behind the Cottle-Dantzig algorithm is to maintain a set of
vertices whose balance is non-negative. The algorithm begins with an arbitrary
strategy, and it goes through a series of major iterations, where in each iteration
one vertex is brought into the set of vertices with non-negative balances, while
maintaining the non-negative balances of the vertices that are already in that
set. It is clear that if such a task can be accomplished, then the algorithm will
terminate after $|V|$ major iterations.

We require a method for bringing some distinguished vertex v into the set 
of vertices with a non-negative balance without the vertices currently in the set 
getting a negative balance in the process. To accomplish this we will modify the 
game by adding a bonus to the edge that the strategy currently chooses at v. 
We will then drive the bonus up from 0 while maintaining an optimal strategy
for the modified game. Eventually the balance of $v$ will become 0 in the modified
game, at which point the strategy at v can be switched away from the edge with 
the bonus attached to it, and the bonus can be removed. We will prove that 
after this procedure v will have a positive balance. 

In this section we will override many of the notations that were used to 
describe Lemke's algorithm.

Definition 10 (Modified Game For The Cottle-Dantzig Algorithm). 

For a real number $w$, a joint strategy $\sigma$, and a distinguished vertex $v$, we define
the game $G_w$ to be the same as $G$ but with a different reward on the edge chosen
by $\sigma$ at $v$. If $\sigma$ chooses the left successor at $v$ then the left reward function is
defined, for every $u$ in $V$, by:



If $\sigma$ chooses the right successor at $v$ then $r^\rho$ is modified in a similar manner.

We begin the major iteration with a strategy $\sigma_0$, a value $w_0 = 0$, and a set
of vertices with non-negative balances $P$. The task is to raise $w$ from 0 until
$\mathrm{Bal}^\sigma_w(v) = 0$, while maintaining the invariant that every vertex in $P$ has a
non-negative balance. This can be accomplished using methods that are similar to
those used in Lemke's algorithm. For every vertex in P we must compute how 
the balance of that vertex changes as w is increased. The following propositions 
are analogues of Propositions 6, 7, and 8. 

Proposition 11. Consider a vertex $u$ and a joint strategy $\sigma$. Suppose that $v$ is
the distinguished vertex. The rate of change $\partial_w \mathrm{Val}^\sigma_w(u)$ is $D^\sigma_u(v)$.

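Proposition 12. Consider a vertex $u$ and a joint strategy $\sigma$ in the game $G_w$.
The rate of change $\partial_w \mathrm{Bal}^\sigma_w(u)$ is:

$$\partial_w \mathrm{Bal}^\sigma_w(u) = \begin{cases}
\partial_w \mathrm{Val}^\sigma_w(u) - \beta \cdot \partial_w \mathrm{Val}^\sigma_w(\bar\sigma(u)) & \text{if } u \in V_{\mathrm{Max}}, \\
\beta \cdot \partial_w \mathrm{Val}^\sigma_w(\bar\sigma(u)) - \partial_w \mathrm{Val}^\sigma_w(u) & \text{if } u \in V_{\mathrm{Min}}.
\end{cases}$$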



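Algorithm 2 Cottle-Dantzig($G$, $\sigma$)

  $P := \emptyset$
  while $P \ne V$ do
    $i := 0$; $w_0 := 0$; $v :=$ some vertex in $V \setminus P$
    while $\mathrm{Bal}^\sigma_{w_i}(v) < 0$ do
      $w_{i+1} := w_i + \min\{ -\mathrm{Bal}^\sigma_{w_i}(u) / \partial_w \mathrm{Bal}^\sigma_{w_i}(u) : u \in P \cup \{v\} \text{ and } \partial_w \mathrm{Bal}^\sigma_w(u) < 0 \}$
      $\sigma := \sigma[\bar\sigma(u)/u]$ for some vertex $u$ with $\mathrm{Bal}^\sigma_{w_{i+1}}(u) = 0$
      $i := i + 1$
    end while
    $\sigma := \sigma[\bar\sigma(v)/v]$; $P := P \cup \{v\}$
  end while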



Proposition 13. Consider a modified game $G_w$, a joint strategy $\sigma$, and a set
of vertices $P$ which must not have negative balances. Let

$$y = w + \min\left\{ -\frac{\mathrm{Bal}^\sigma_w(u)}{\partial_w \mathrm{Bal}^\sigma_w(u)} : u \in P \cup \{v\} \text{ and } \partial_w \mathrm{Bal}^\sigma_w(u) < 0 \right\}.$$

No vertex in $P$ has a negative balance in $G_y$. Moreover, one vertex in $P \cup \{v\}$
is indifferent, and for all values $x > y$ that vertex has a negative balance in $G_x$.

The process of raising $w$ up from 0 until the balance of $v$ is 0 in the modified
game is the same as the process of decreasing $z$ in Lemke's algorithm, only using
the different definitions from Propositions 11, 12, and 13. Once the balance of $v$
has reached 0 we can stop increasing $w$. Since $v$ is now indifferent we can switch
it away from the edge that has the bonus attached to it. Once this has been 
done, the values of all vertices are no longer affected by w, since the edge to 
which it is attached is no longer chosen by the current strategy. Therefore we 
can remove the bonus and recover the original game. The major iteration then 
terminates with a strategy in which every vertex in P U {v} has a non-negative 
balance, and the next major iteration can begin. 

Theorem 14. Algorithm 2 terminates, with the optimal joint strategy, after at 
most iterations. 



5 Exponential Lower Bounds 

We show that both Lemke's and the Cottle-Dantzig algorithms take exponen- 
tially many steps on the family of games shown in Figure 1. Max vertices are 
depicted as squares and Min vertices are depicted as circles. For every vertex, 
we define the right successor to be the vertex with the same owner as the vertex 
itself, and the left successor to be the vertex that belongs to the other player. 
Recall that the initial strategy for Lemke's algorithm is the one that chooses the 
right successor for every vertex. When speaking about vertices in the game we 
often refer to either the leftmost or the rightmost vertex with a certain property.
In this context, the vertex being referred to is the one that is furthest to the
left or to the right in Figure 1.

For ease of exposition, we will describe the steps of the algorithm as if the 
discount factor was 1. Although this is forbidden by the definition of a discounted 
game, since the game contains one cycle, whose value is zero, the value of every 
vertex under every strategy will be finite. As long as the discount factor is chosen 
sufficiently close to 1, the algorithm will behave as we describe. 




Fig. 1. The game $\mathcal{G}_n$.

Note that the game graph is symmetric with respect to the line that separates 
the vertices of the two players. We frequently refer to a vertex and the vertex 
that it is opposite to, and hence we introduce the concept of vertex reflections. 
For a vertex $v$ that is not the sink, we write $\bar v$ to denote the reflection of $v$,
that is the vertex belonging to the other player that is shown directly opposite $v$
in Figure 1. We say that a joint strategy $\sigma$ is symmetric if for all vertices $v$,
the strategy $\sigma$ chooses the right successor of $v$ if and only if it chooses the right
successor of $\bar v$. The initial strategy for Lemke's algorithm is a symmetric strategy.
Lemke's algorithm always switches $\bar v$ directly before or after $v$ and so it can be
seen as traversing through symmetric strategies.

Before discussing the modified games that Lemke's algorithm constructs, we 
give a simple characterisation of when a vertex is switchable in the original game.

Proposition 15. If $\sigma$ is a symmetric joint strategy, then a vertex $v$ is switchable
if and only if the path from $v$ has an even number of left edges.

We now use this characterisation to give a simple formula for $\partial_{-z} \mathrm{Bal}^\sigma_z(v)$ for
every vertex $v$ under every symmetric joint strategy $\sigma$.

Proposition 16. If $\sigma$ is a symmetric joint strategy, then $\partial_{-z} \mathrm{Bal}^\sigma_z(v)$, the rate
of change of the balance of a vertex $v$, is 1 if $v$ is switchable, and $-1$ otherwise.

Together, Propositions 15 and 16 imply that the parameter z can be set to 
the largest balance of a switchable vertex. We show that the largest balance will 
always belong to the rightmost switchable vertex. 

Proposition 17. Let $\sigma$ be a symmetric joint strategy, $v$ be the rightmost
switchable Max vertex, and $z = -\mathrm{Bal}^\sigma(v)$. Then no vertex in $G_z$ is switchable, both $v$
and the reflection of $v$ are indifferent, and for every real number $y < z$, there is
a switchable vertex in $G_y$.

Proposition 17 implies that whenever Lemke's algorithm is considering a sym- 
metric joint strategy, it must choose a z so that the rightmost switchable vertex 
is indifferent. We show that this leads to an exponential number of switches. 

Theorem 18. Lemke's algorithm performs $2^{n+1} - 2$ iterations on the game $\mathcal{G}_n$.

The Cottle-Dantzig algorithm is sensitive to the order in which to bring 
the vertices into the non-negative set. We prove that there is an order that 
causes exponential-time behaviour for this algorithm. The sequence of strategies 
is similar to the sequence that Lemke's algorithm follows. 

Theorem 19. Consider an order in which all Min vertices precede Max vertices, 
and Max vertices are ordered from right to left. The Cottle-Dantzig algorithm 
performs $2^{n+1} - 1$ iterations.

We have shown that both algorithms can take an exponential number of 
steps on the discounted game $\mathcal{G}_n$. We argue that this also implies an exponential
lower bound for parity and average-reward games. From the game $\mathcal{G}_n$ we can
obtain a parity game by replacing the reward $\pm 2^c$ with the priority $c$. The
standard reductions [16, 21, 13] convert this parity game into average-reward
and discounted games where priority $c$ is replaced with reward $(-n)^c$, and the
discount factor is chosen to be very close to 1. All arguments used to prove
Theorems 18 and 19 continue to hold if rewards of magnitude $2^c$ are replaced
with rewards of magnitude $n^c$, which implies the exponential lower bounds also
hold for parity games and average-reward games. 

6 Future Work 

Our adaptation of Lemke's algorithm for solving discounted games corresponds 
to its implementation in which the unit covering vector is used [6], and our 
lower bounds are specific to this choice. Similarly our lower bounds for the 
Cottle-Dantzig algorithm require a specific choice of ordering over the vertices. 
Randomizing these choices may exhibit better performance and should be con- 
sidered. 

Adler and Megiddo [1] studied the performance of Lemke's algorithm for 
the LCPs arising from linear programming problems. They showed that, for 
randomly chosen linear programs and a carefully selected covering vector, the 
expected number of pivots performed by the algorithm is quadratic. A similar 
analysis for randomly chosen discounted games should be considered. 
A Proofs for Section 3



A.1 Proof of Proposition 5

Proof. If there is no vertex $v$ that satisfies $\mathrm{Bal}^{\sigma_0}(v) < 0$ then the current strategy
is optimal by Theorem 2 and there is no need to modify the game. Otherwise,
the definition of the modified game states that for every Max vertex, the weight
on the edge not chosen by $\sigma_0$ is decreased by $z_0$. Similarly for every Min vertex,
the weight on the edge not chosen by $\sigma_0$ is increased by $z_0$. Since none of the
modified weights are on edges chosen by $\sigma_0$ every vertex has the same value
in $G_{z_0}$ as it does in $G$ when $\sigma_0$ is played, and only the balance of the vertices will
change. These three observations imply that the balance of all vertices, as given
by equation (1), is increased by $z_0$. Since $z_0$ was chosen to be equal in magnitude
to the largest negative balance, the balance of every vertex when $\sigma_0$ is played
in $G_{z_0}$ must be non-negative. Therefore $\sigma_0$ is optimal in $G_{z_0}$ by Theorem 2. It is
also easy to see that the vertex with the largest negative balance is indifferent
and that this vertex would have a negative balance for all games modified by a
value smaller than $z_0$. □

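A.2 Proof of Proposition 6

Proof. From equation (2) we find that for a vertex $v$,

$$r^\sigma_{z-c}(v) = \begin{cases}
r^\sigma_z(v) + c & \text{if } v \in V_{\mathrm{Max}} \text{ and } \sigma(v) = \lambda(v), \\
r^\sigma_z(v) - c & \text{if } v \in V_{\mathrm{Min}} \text{ and } \sigma(v) = \lambda(v), \\
r^\sigma_z(v) & \text{otherwise.}
\end{cases}$$

Therefore, the change in $\mathrm{Val}^\sigma(v)$ is equal to:

$$\begin{aligned}
\mathrm{Val}^\sigma_{z-c}(v) - \mathrm{Val}^\sigma_z(v)
&= \sum_{i=0}^{k-1} \beta^i \cdot r^\sigma_{z-c}(v_i) + \sum_{i=0}^{l-1} \frac{\beta^{k+i}}{1 - \beta^l} \cdot r^\sigma_{z-c}(c_i) - \mathrm{Val}^\sigma_z(v) \\
&= \sum_{x \in L} \bigl([x \in V_{\mathrm{Max}}] \cdot D^\sigma_v(x) - [x \in V_{\mathrm{Min}}] \cdot D^\sigma_v(x)\bigr) \cdot c + \mathrm{Val}^\sigma_z(v) - \mathrm{Val}^\sigma_z(v) \\
&= \sum_{x \in L} \bigl([x \in V_{\mathrm{Max}}] \cdot D^\sigma_v(x) - [x \in V_{\mathrm{Min}}] \cdot D^\sigma_v(x)\bigr) \cdot c.
\end{aligned}$$

□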



A.3 Proof of Proposition 7

Proof. The formula is obtained by substituting $\partial_{-z} \mathrm{Val}^\sigma_z$ for $\mathrm{Val}^\sigma$ in equation (1).
Additional care must be taken if $\sigma$ chooses the right successor of $v$. In this
case, the reward on the left edge of $v$ will also change as $z$ is decreased, and this is not
captured by $\partial_{-z} \mathrm{Val}^\sigma_z(\bar\sigma(v))$. This can be corrected for, however, by substituting
$[\bar\sigma(v) = \lambda(v)]$ for $r^{\bar\sigma}(v)$ in equation (1). □



A.4 Proof of Proposition 8

Proof. All vertices $v$ with
$\partial_{-z}\mathrm{Bal}^{\sigma_i}_{z_{i-1}}(v) \geq 0$ can be ignored since
their balances will not decrease as $z$ is decreased from $z_{i-1}$. For every
vertex $v$ with $\partial_{-z}\mathrm{Bal}^{\sigma_i}_{z_{i-1}}(v) < 0$, the ratio
$\mathrm{Bal}^{\sigma_i}_{z_{i-1}}(v) / \partial_{-z}\mathrm{Bal}^{\sigma_i}_{z_{i-1}}(v)$
gives the largest amount that $z$ can be decreased by before
$\mathrm{Bal}^{\sigma_i}_{z}(v)$ becomes equal to zero. By choosing $z_i$ to be the
minimum over these ratios we ensure that the balance of all vertices remains
non-negative while the vertex with the minimum ratio becomes indifferent. For
all values $x < z_i$ the vertex with the minimum ratio will have a negative
balance, which implies that $\sigma_i$ will not be optimal in $G_x$ by Theorem 2. □
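
The ratio test used in this proof can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the function name `next_parameter_step`, the dictionary inputs, and the convention of reporting the decrease as a positive amount (balance divided by the magnitude of its rate of change) are all hypothetical choices made here.

```python
from fractions import Fraction

def next_parameter_step(balance, rate):
    """Return (step, vertex): how far z can be decreased and which vertex
    becomes indifferent first, or None if no balance ever decreases.

    balance[v]: current non-negative balance of vertex v (assumed input)
    rate[v]:    change of the balance of v per unit decrease of z
    """
    best = None
    for v, b in balance.items():
        r = rate[v]
        if r >= 0:
            continue  # this balance does not decrease, so v can be ignored
        step = Fraction(b) / Fraction(-r)  # decrease that brings v to zero
        if best is None or step < best[0]:
            best = (step, v)
    return best

# Hypothetical data: three vertices with their balances and rates.
balance = {"a": 6, "b": 4, "c": 5}
rate = {"a": -2, "b": -1, "c": 1}
step, vertex = next_parameter_step(balance, rate)
after = {v: balance[v] + rate[v] * step for v in balance}
```

Taking the minimum over the candidate steps is what keeps every balance non-negative: any larger step would push the minimising vertex below zero.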

A.5 Proof of Theorem 9

Proof. In each step we have a game $G_{z_i}$ in which the current strategy
$\sigma_i$ is optimal. We also know that there is a unique vertex $v$ that is
indifferent, and that $\partial_{-z}\mathrm{Bal}^{\sigma_i}_{z_i}(v) < 0$, which
implies that $z$ cannot be decreased further without the balance of $v$ becoming
negative. The vertex $v$ is then switched in $\sigma_i$, giving $\sigma_{i+1}$.
Note that by Proposition 7, the balance of $v$ is computed as the difference
between the successor chosen by the current strategy and the alternate successor.
Switching $v$ causes this difference to be reversed and therefore, we have:

\[
\partial_{-z}\mathrm{Bal}^{\sigma_i}_{z_i}(v) = -\partial_{-z}\mathrm{Bal}^{\sigma_{i+1}}_{z_i}(v).
\]

It follows that as $z$ is decreased, $\mathrm{Bal}^{\sigma_{i+1}}(v)$ will increase.
Since $v$ was the unique indifferent vertex, $z$ can be decreased by a strictly
positive amount while maintaining optimality of $\sigma_{i+1}$.

Proposition 8 implies that $\sigma_i$ is not optimal in $G_x$ for all $x < z_i$.
Since $z_{i+j} < z_i$ for all $j > 0$ it follows that $\sigma_i$ will never again
be an optimal strategy for the modified game. The algorithm always maintains
optimality of the current strategy in the modified game, and so it can never
revisit $\sigma_i$. There are only $2^{|V|}$ possible strategies and therefore,
the algorithm must terminate after at most $2^{|V|}$ iterations. The algorithm
must terminate at the optimal strategy since optimality of the current strategy
in the modified game is maintained in each iteration and it cannot terminate
until $z$ reaches zero, and $G_0$ is the same as the original game. □

B Proofs for Section 4 

B.1 Proof of Proposition 11

Proof. The proof is very similar to the proof of Proposition 6. Since
$D^{\sigma}_{u}(v)$ gives the precise contribution of the weight on the edge of
$v$ to the value of $u$, and the outgoing edge of $v$ is the only edge that is
modified in $G_w$, it is clear that the value of $u$ will increase in proportion
to $D^{\sigma}_{u}(v)$. □



B.2 Proof of Proposition 12 

Proof. The proof is very similar to the proof of Proposition 7. We obtain the
expression by substituting $\partial_w \mathrm{Val}^{\sigma}_{w}(v)$ for
$\mathrm{Val}^{\sigma}(v)$ in Definition 1. In contrast to the expression used for
Lemke's algorithm, we do not need to account for the edge not chosen by $\sigma$
at $v$. This is because only one edge is modified in $G_w$, and it is chosen by
all strategies during the current major iteration. □

B.3 Proof of Theorem 14 

Proof. As with Lemke's algorithm, we assume for now that there are no
degenerate steps during the execution of the algorithm, as these can be resolved
using the same rules that were outlined for Lemke's algorithm. Using an argument
identical to the one given for Theorem 9, it can be shown that during each
major iteration the algorithm can pass through at most $2^{|P|}$ different
strategies. Since the set $P$ is equal to $V$ after the final major iteration,
and no vertex in $P$ may ever have a negative balance, no vertex will be
switchable in the final strategy, which must therefore be optimal by Theorem 2.
To see that the algorithm terminates after considering at most $2^{|V|}$
strategies, note that each major iteration passes through at most $2^{|P|}$
strategies. The maximum number of strategies visited must therefore be:

\[
\sum_{i=0}^{|V|-1} 2^{i} = 2^{|V|} - 1.
\]

□

C Proofs for Section 5

C.1 Proof of Proposition 15

Proof. If the vertex $v$ belongs to player Max and the path from $v$ uses an even
number of left edges then the final edge used before reaching the loop will have
weight $-2^{n}$. By symmetry, the path starting at the alternate successor of $v$
will pass through the weight $2^{n}$. If we ignored the other weights on the two
paths then the balance of $v$ would be $-2 \times 2^{n}$ and $v$ would be
switchable. It can easily be verified that no matter which other edges the path
passes through, the sum of the weights of those edges can never reach
$2 \times 2^{n}$. This means that the balance of $v$ cannot be brought above $0$
and the vertex will therefore be switchable. Conversely, if the path uses an odd
number of left edges then the difference of the final edge used will be
$2 \times 2^{n}$ and there is no path that has enough weight to bring the balance
of the vertex below zero, implying that the vertex is not switchable. The
arguments work symmetrically for vertices owned by player Min. □



C.2 Proof of Proposition 16 



Proof. We prove the proposition for the case when $v$ is switchable. The proof
for the case where $v$ is not switchable is very similar. Since $v$ is
switchable, the path from $v$ that follows $\sigma$ contains an even number of
left edges. Since the owner of the vertex through which the path passes changes
only when a left edge is taken, it follows that precisely half of the left edges
originate from vertices belonging to either player. If $X$ is the set of vertices
for which the path from $v$ uses a left edge, then by Proposition 6 the rate of
change of the value of $v$ is:

\[
\begin{aligned}
\partial_{-z}\mathrm{Val}^{\sigma}_{z}(v)
&= \sum_{x \in X} \bigl([x \in V_{\mathrm{Max}}] \cdot D^{\sigma}_{v}(x)
 - [x \in V_{\mathrm{Min}}] \cdot D^{\sigma}_{v}(x)\bigr) \\
&= \sum_{x \in X} \bigl([x \in V_{\mathrm{Max}}] - [x \in V_{\mathrm{Min}}]\bigr) = 0.
\end{aligned}
\]

If $\sigma$ chooses the left successor of $v$ then the path starting from
$\sigma(v)$ that follows $\sigma$ contains an odd number of left edges, and by
symmetry the path that follows $\sigma$ from the alternate successor of $v$ has
an odd number of left edges. Alternatively, if $\sigma$ chooses the right
successor of $v$ then the path that follows $\sigma$ from $\sigma(v)$ contains an
even number of left edges and so does the path that follows $\sigma$ from the
alternate successor of $v$. Either way, the path starting at $v$ that moves to
$\bar{\sigma}(v)$ and then follows $\sigma$ contains an odd number of left edges.
Moreover, the final left edge on that path emanates from a vertex owned by the
same player that owns $v$. Therefore, by Proposition 6,

\[
[\bar{\sigma}(v) = \lambda(v)] + \beta \cdot \partial_{-z}\mathrm{Val}^{\sigma}_{z}(\bar{\sigma}(v)) =
\begin{cases}
1 & \text{if } v \in V_{\mathrm{Max}}, \\
-1 & \text{if } v \in V_{\mathrm{Min}}.
\end{cases}
\]

The proposition can now be proved by substituting the values into Proposition 7.

□



C.3 Proof of Proposition 17 

Proof. Let $\{v_0, v_1, \ldots, v_k\}$ be the set of switchable vertices belonging
to player Max ordered left to right. By Proposition 15 we know that there are an
even number of left edges on the path that follows $\sigma$ from each of these
vertices. From a vertex $v_i$, the path either takes a right edge to $v_{i-1}$ or
takes a left edge to a Min vertex, followed by a possibly empty sequence of right
edges passing through Min vertices, followed by a second left edge back to a Max
vertex $u$. Since the path from $v_i$ passes through an even number of left
edges, the path from all of the odd vertices passed through must have an odd
number of left edges, indicating that they are all unprofitable, and by symmetry
all of the Max vertices between $v_i$ and $u$ are also unprofitable. It follows
that $u$ is equal to $v_{i-1}$ and that the path from $v_i$ passes through every
vertex $v_j$ with $j < i$.

Since both of the outgoing edges of $v_i$ have the same weight, the balance at
$v_i$ is:

\[
\mathrm{Bal}^{\sigma}(v_i) = \mathrm{Val}^{\sigma}(\sigma(v_i)) - \mathrm{Val}^{\sigma}(\bar{\sigma}(v_i)).
\]

Since the path leaving $v_{i+1}$ that follows $\sigma$ passes through $v_i$, by
symmetry the path leaving the alternate edge of $v_{i+1}$ must pass through the
reflection $\bar{v}_i$ of $v_i$ and then $\bar{\sigma}(v_i)$. Let $c$ and $d$
denote the sum of the weights along the path from $v_{i+1}$ to $v_i$ and from the
alternate edge of $v_{i+1}$ to the reflection of $v_i$. The balance at $v_{i+1}$
is then:

\[
\begin{aligned}
\mathrm{Bal}^{\sigma}(v_{i+1})
&= \bigl(\mathrm{Val}^{\sigma}(\sigma(v_i)) + r^{\sigma}(v_i) + c\bigr)
 - \bigl(\mathrm{Val}^{\sigma}(\bar{\sigma}(v_i)) + r^{\sigma}(\bar{v}_i) + d\bigr) \\
&= \mathrm{Bal}^{\sigma}(v_i) + \bigl(r^{\sigma}(v_i) + c\bigr) - \bigl(r^{\sigma}(\bar{v}_i) + d\bigr).
\end{aligned}
\]

Note that $r^{\sigma}(v_i)$ and $-r^{\sigma}(\bar{v}_i)$ are both negative and
that $c - d$ cannot possibly cancel them out. Therefore:

\[
\mathrm{Bal}^{\sigma}(v_i) > \mathrm{Bal}^{\sigma}(v_{i+1}).
\]

Now, if $z$ is equal to $-\mathrm{Bal}^{\sigma}(v_k)$, since all other switchable
vertices have balances with smaller magnitude, none of them will be switchable in
$G_z$ and $v_k$ will be indifferent. For any real parameter $y$ less than $z$ the
vertex $v_k$ would be switchable in $G_y$. □



C.4 Proof of Theorem 18 

Proof. The claim will be proved by induction. In $G_1$ there are only two
vertices with more than one successor and they are reflections of each other.
Initially they are both profitable, since their paths use zero left edges, and
in the first and second iterations they switch, arriving at the optimal strategy
in two iterations.

The game $G_n$ can be broken down into the leftmost pair of vertices and a
game $G_{n-1}$ which encompasses the vertices to the right of this pair. Since
Lemke's algorithm works from the right, an optimal strategy for the game
$G_{n-1}$ must be computed before the algorithm will touch the leftmost pair.
Once the algorithm has arrived at this strategy the leftmost pair will both be
switched, one after the other. Suppose that while solving the sub-game $G_{n-1}$
the algorithm passed through the sequence of strategies
$\langle \sigma_0, \sigma_1, \ldots, \sigma_k \rangle$. Let $\sigma'_i$ be the
strategy $\sigma_i$ with the leftmost pair switched. Note that due to
Proposition 15 a vertex pair is switchable in $\sigma_i$ if and only if it is not
switchable in $\sigma'_i$. Suppose that $v$ and $\bar{v}$ are the rightmost
switchable pair of vertices in $\sigma'_i$. After switching the pair, the
algorithm moves to a strategy $\tau$ in which $v$ and $\bar{v}$ are not
switchable and everything to the right of them is switchable. If the leftmost
pair were not switched then the situation would be precisely reversed, with $v$
and $\bar{v}$ being the rightmost switchable pair. It follows that
$\tau = \sigma'_{i-1}$ and therefore, after switching the leftmost pair, the
algorithm will proceed to pass through all of the strategies that it has visited
up to that point in reverse order. From this we obtain the recursion:

\[
T(1) = 2, \qquad T(n) = T(n-1) + 2 + T(n-1).
\]

It can easily be verified that $T(n) = 2^{n+1} - 2$.

□
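
The closed form can be checked by induction: if $T(n-1) = 2^{n} - 2$ then $T(n) = 2(2^{n} - 2) + 2 = 2^{n+1} - 2$. The short Python sketch below unrolls the recursion numerically; the function name `lemke_steps` is a hypothetical label chosen here, and the code only evaluates the recurrence, it does not run Lemke's algorithm on the games.

```python
def lemke_steps(n):
    """Evaluate T(n) from T(1) = 2 and T(n) = T(n-1) + 2 + T(n-1)."""
    t = 2  # base case T(1)
    for _ in range(2, n + 1):
        # solve the sub-game, switch the leftmost pair (2 steps),
        # then revisit the sub-game strategies in reverse order
        t = t + 2 + t
    return t

values = [lemke_steps(n) for n in range(1, 11)]
closed_form = [2 ** (n + 1) - 2 for n in range(1, 11)]
```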



D Exponential Lower Bound For The Cottle-Dantzig 
Algorithm (Proof of Theorem 19) 

We show that the example that was used to show an exponential lower bound for 
Lemke's algorithm can also be used to show an exponential lower bound for the 
Cottle-Dantzig algorithm. The initial strategy will be the one that selects the 
right successor for every Max vertex and the left successor for every Min vertex. 
Note that for this strategy every Min vertex has a positive balance and every 
Max vertex has a negative balance. Recall that the Cottle-Dantzig algorithm 
allows a free choice for the order in which vertices are brought into the non- 
negative set. We use the following order: all of the Min vertices will be brought 
in first, followed by the Max vertices, which will be brought in from right to 
left. Since the balance of all Min vertices is positive, bringing them into the 
non-negative set is a trivial operation that does not require modification of the 
initial strategy. The major iterations that follow will be numbered 1 to $n$, and
it is our goal to show that the Cottle-Dantzig algorithm will take $2^{i} - 1$
steps in major iteration $i$.




Fig. 2. The state of the algorithm at the start of major iteration $i$. The
figure marks the region labelled "symmetric portion".

The situation at the start of major iteration i is depicted in Figure 2. The 
edges chosen by the current strategy are depicted as solid edges, while dotted 
edges show the alternate edges. The indices on the Max vertices show in which 
iteration they are designated to enter the non-negative set. For convenience, we 
use these indices to identify both the vertices themselves and the pair consisting 
of the vertex with index i and its reflection. It can easily be seen that in major 
iteration i the strategy may only change at vertices that are to the right of the 
vertex pair with index i, since no other vertices can reach the Max vertex with 
index i. Unlike Lemke's algorithm, the overall strategy will not be symmetric 
until the optimal strategy is found, but after major iteration i the strategy will 
be symmetric on the i rightmost vertex pairs. This is shown as the symmetric 
portion in Figure 2. 

We begin by giving a characterisation of $\partial_w \mathrm{Bal}^{\sigma}_{w}$
for the vertices in the symmetric portion.



Proposition 20. Suppose that the algorithm is in major iteration $i$ at a
strategy $\sigma$, and that $v$ is a vertex in the symmetric portion of $\sigma$.
Let $\pi$ be the path from $v$ to the vertex with index $i$ or its reflection,
according to $\sigma$. Then we have:

\[
\partial_w \mathrm{Bal}^{\sigma}_{w}(v) =
\begin{cases}
-1 & \text{if } \pi \text{ contains an odd number of left edges,} \\
1 & \text{otherwise.}
\end{cases}
\]

Proof. If $v$ is a Max vertex and the path from $v$ to the vertex pair with
index $i$ takes an odd number of left edges then the path will arrive at the
reflection of the Max vertex with index $i$. It follows that
$\partial_w \mathrm{Val}^{\sigma}_{w}(v)$ is equal to $0$, since the path does not
pass through the Max vertex with index $i$. Now consider the alternate successor
of $v$, which we call $u = \bar{\sigma}(v)$. Since the strategy is symmetric
until it reaches the vertex pair with index $i$, the path from $u$ must lead to
the Max vertex with index $i$ and so $\partial_w \mathrm{Val}^{\sigma}_{w}(u)$ is
equal to $1$. Substituting these into the definition of
$\partial_w \mathrm{Bal}^{\sigma}_{w}(v)$ gives:

\[
\partial_w \mathrm{Bal}^{\sigma}_{w}(v)
= \partial_w \mathrm{Val}^{\sigma}_{w}(v) - \beta \cdot \partial_w \mathrm{Val}^{\sigma}_{w}(u)
= 0 - 1 = -1.
\]

If the path from $v$ uses an even number of left edges then the above is
reversed: the path from $v$ passes through the Max vertex with index $i$ and the
path from $u$ does not. With the same reasoning we can conclude that
$\partial_w \mathrm{Bal}^{\sigma}_{w}(v) = 1$. The proof for vertices belonging to
player Min is entirely symmetrical. □

Proposition 20 implies that for the initial strategy $\sigma_0$ in major
iteration $i$, every vertex $v$ to the right of the vertex pair with index $i$
has $\partial_w \mathrm{Bal}^{\sigma_0}_{w}(v)$ equal to $-1$. It follows that as
$w$ is increased, the first vertex to switch will be the one with the smallest
balance. For the initial strategy this will be the rightmost vertex pair.

Proposition 21. Consider the first strategy in major iteration $i$. For every
vertex $v$ with index $j$ which is smaller than $i$, the balance of $v$ will be

\[
2 \times \Bigl(2^{i} - \sum_{k=j+1}^{i-1} 2^{k}\Bigr).
\]

Proof. It can be seen in Figure 2 that the Max vertex with index $i - 1$ has
balance:

\[
2^{i} - 2^{i-1} - (-2^{i} - 2^{i-1}) = 2 \times 2^{i}.
\]

The Min vertex with index $i - 1$ has the same balance. Now, for every vertex to
the right of the Max vertex $v$ with index $i - 1$, the balance can be computed by
considering two paths: the path that follows $\sigma$ from $v$ to the sink and
the path that uses the alternate edge of $v$ and then follows $\sigma$ to the
sink. These two paths join at the Max vertex with index $i + 1$, and so the
weights after this point are irrelevant. The two weights on the outgoing edges
are also irrelevant since they are identical and one will be subtracted from the
other. If $v$ is the Max vertex with index $j$ then its balance is:

\[
\begin{aligned}
& 2^{i} - 2^{i-1} - \cdots - 2^{j} - \bigl(-2^{i} + 2^{i-1} + \cdots + 2^{j+1} - 2^{j}\bigr) \\
&\quad = 2 \times \Bigl(2^{i} - \sum_{k=j+1}^{i-1} 2^{k}\Bigr).
\end{aligned}
\]

A symmetric proof can be used to show that the Min vertices have identical
balances to their reflections. □
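
Because $\sum_{k=j+1}^{i-1} 2^{k} = 2^{i} - 2^{j+1}$, the expression in Proposition 21 simplifies to $2 \times 2^{j+1}$, so the balance grows with the index and the rightmost pair (index 1) has the smallest one. The following Python sketch evaluates the formula directly; the function name `initial_balance` and the sample value of `i` are illustrative assumptions, and the code checks only the arithmetic of the formula, not the game itself.

```python
def initial_balance(i, j):
    """Balance stated in Proposition 21 for a vertex of index j < i."""
    return 2 * (2 ** i - sum(2 ** k for k in range(j + 1, i)))

i = 6  # hypothetical major iteration used for the check
balances = {j: initial_balance(i, j) for j in range(1, i)}
smallest_index = min(balances, key=balances.get)
```

For `i = 6` this gives balances 8, 16, 32, 64 and 128 for indices 1 to 5, which agrees with the value $2 \times 2^{i}$ computed above for the index $i - 1$.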



Our goal is to show that major iteration $i$ will take $2^{i} - 2$ steps. For
this purpose we define, for $1 \le j \le i$, the quantity $k_j$ to be equal to the
number of strategies that the Cottle-Dantzig algorithm passes through before the
vertex with index $j$ has balance $0$ for the first time. Furthermore, we define
$w_j$ to be the value of the parameter $w$ the first time the vertex with index
$j$ has balance zero. Note that Proposition 20 implies that the first value of
$w$ chosen by the algorithm will be the minimum over the balances of the vertices
in the symmetric portion. Proposition 21 implies that this will be the rightmost
pair of vertices, whose indices are 1. Therefore we set $k_1 = 0$ and
$w_1 = 2 \times (2^{i} - \sum_{k=2}^{i-1} 2^{k})$. The rightmost pair of vertices
will be indifferent in the game $G_{w_1}$. We use these as the base case for the
following inductive proposition.

Proposition 22. Suppose that the algorithm is in major iteration $i$ and that
it has passed through $k_j$ strategies, arriving at the first strategy in which
the vertices with index $j$ have balance $0$. All of the following hold:

- The algorithm will pass through $k_j + 2$ further strategies before the vertex
with index $j + 1$ has balance $0$ for the first time. That is,
$k_{j+1} = 2k_j + 2$.

- The value of $w_{j+1}$, which is equal to the value of the parameter $w$ when
the vertex with index $j + 1$ has balance $0$ for the first time, will be
$w_{j+1} = 2 \times (2^{i} - \sum_{k=j+2}^{i-1} 2^{k})$.

- No vertex with index higher than $j + 1$ will be switched before the vertex
with index $j + 1$ has balance $0$.

Proof. Let $\langle \sigma_0, \sigma_1, \ldots, \sigma_{k_j} \rangle$ be the
sequence of strategies that the algorithm considered before the vertex with index
$j$ had balance $0$ for the first time. For a strategy $\sigma_l$ in this
sequence we define the strategy $\sigma'_l$ as follows, for every vertex $v$:

\[
\sigma'_l(v) =
\begin{cases}
\bar{\sigma}_l(v) & \text{if the index of } v \text{ is } j, \\
\sigma_l(v) & \text{otherwise.}
\end{cases}
\]

In other words, $\sigma'_l$ is $\sigma_l$ in which the vertices with index $j$
have been switched. Since both vertices with index $j$ are indifferent in
$\sigma_{k_j}$, the algorithm will spend two iterations switching them, one after
the other. Note that the algorithm has now arrived at $\sigma'_{k_j}$.
Proposition 20 implies that for every $l$ in the range $1 \le l \le k_j$ and every
vertex $v$ with index smaller than $j$,

\[
\partial_w \mathrm{Bal}^{\sigma_l}_{w}(v) = -\partial_w \mathrm{Bal}^{\sigma'_l}_{w}(v).
\]

This is because after switching the vertex pair with index $j$ every vertex with
index less than $j$ sees an extra left edge on its path to the cycle. So, as $w$
is increased, the balance of every vertex in $\sigma'_l$ will move in a direction
that is opposite to the way that it moved in $\sigma_l$. Under the assumption
that no vertex with index higher than $j$ becomes indifferent, it can easily be
verified that if $w$ is raised by $w_j - w_1$ the Cottle-Dantzig algorithm will
pass through the sequence of strategies
$\langle \sigma'_{k_j}, \sigma'_{k_j - 1}, \ldots, \sigma'_0 \rangle$, which
together with the two iterations spent switching the vertices with index $j$
implies that the algorithm will pass through $k_j + 2$ further strategies, that
is, $2k_j + 2$ strategies in total.

We now prove the assumption that the algorithm can raise $w$ to
$w_j + (w_j - w_1)$ while not touching the vertices with indices higher than $j$.
For a vertex $v$ with index $m$, which is higher than $j$, Proposition 21 implies
that the balance of $v$ at the start of major iteration $i$ was:

\[
2 \times \Bigl(2^{i} - \sum_{k=m+1}^{i-1} 2^{k}\Bigr).
\]

Furthermore, the inductive hypothesis guarantees that the strategy has not been
changed from the initial strategy on all vertices with index higher than $j$.
Since in the initial strategy $\partial_w \mathrm{Bal}^{\sigma_0}_{w}(v) = -1$ it
follows that $\partial_w \mathrm{Bal}^{\sigma_l}_{w}(v) = -1$ and
$\partial_w \mathrm{Bal}^{\sigma'_l}_{w}(v) = -1$ for all $l$ in the range
$1 \le l \le k_j$. Hence, the balance of $v$ has been decreasing continuously
since the start of major iteration $i$. Therefore, in iteration $k_j$, when the
vertex with index $j$ had balance $0$ for the first time, the balance of $v$ was:

\[
\begin{aligned}
& 2 \times \Bigl(2^{i} - \sum_{k=m+1}^{i-1} 2^{k}\Bigr) - w_j \\
&\quad = 2 \times \Bigl(2^{i} - \sum_{k=m+1}^{i-1} 2^{k}\Bigr)
 - \Bigl(2 \times \Bigl(2^{i} - \sum_{k=j+1}^{i-1} 2^{k}\Bigr)\Bigr) \\
&\quad = 2 \times \sum_{k=j+1}^{m} 2^{k}.
\end{aligned}
\]

From this we can conclude that the balance of every vertex with index higher
than $j$ is at least $2 \times 2^{j+1}$. We intend to raise $w$ by an amount equal
to $w_j - w_1$, which is equal to:

\[
\begin{aligned}
& 2 \times \Bigl(2^{i} - \sum_{k=j+1}^{i-1} 2^{k}\Bigr)
 - \Bigl(2 \times \Bigl(2^{i} - \sum_{k=2}^{i-1} 2^{k}\Bigr)\Bigr) \\
&\quad = 2 \times \sum_{k=2}^{j} 2^{k} < 2 \times (2^{j+1} - 1).
\end{aligned}
\]

Since this value is smaller than $2 \times 2^{j+1}$, no vertex with index higher
than $j$ will become negative while $w$ is being raised.

So far we have proved that the algorithm will pass through $k_j + 2$ further
strategies without changing the strategy at the vertices with indices higher
than $j$. We now prove that in iteration $2k_j + 2$ the vertex pair with index
$j + 1$ will be indifferent. Note that in $\sigma'_0$ the path from every vertex
with index less than or equal to $j$ uses precisely two left edges: one which
belongs to a vertex with index $j$ and one that belongs to a vertex with index
$i - 1$. From this we can conclude, by Proposition 20, that
$\partial_w \mathrm{Bal}^{\sigma'_0}_{w}(v) = 1$ for every vertex $v$ with index
smaller than or equal to $j$. This implies that once the algorithm has reached
$\sigma'_0$ it can continue to raise $w$ until the balance of some vertex with
index greater than $j$ becomes $0$. We have already shown that the balance of
every vertex with index higher than $j$ has decreased in every iteration before
the algorithm arrived at $\sigma'_0$. Therefore, to find the first vertex whose
balance becomes $0$ as $w$ is increased we need only find the vertex whose
balance was the smallest, among those vertices with indices higher than $j$, at
the start of major iteration $i$. From Proposition 21 this is clearly the vertex
with index $j + 1$.

We have now fully proved the first and third parts of the proposition. To
finish the proof we need only compute the value that $w$ must be set to in
order to make the vertices with index $j + 1$ have a balance equal to $0$. Since
the balance of these vertices has always decreased, this must be equal to the
balance that these vertices had at the start of major iteration $i$. By
Proposition 21 this is equal to:

\[
2 \times \Bigl(2^{i} - \sum_{k=j+2}^{i-1} 2^{k}\Bigr).
\]

□
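
The arithmetic behind Proposition 22 can be checked numerically. The Python sketch below is an illustration under stated assumptions: the helper names `w_value` and `k_count` and the sample value of `i` are hypothetical, and the code evaluates only the formulas for $w_j$ and the recurrence $k_{j+1} = 2k_j + 2$ with $k_1 = 0$, not the algorithm itself.

```python
def w_value(i, j):
    """w_j = 2 * (2^i - sum_{k=j+1}^{i-1} 2^k), the formula used in the proof."""
    return 2 * (2 ** i - sum(2 ** k for k in range(j + 1, i)))

def k_count(j):
    """k_j obtained by unrolling k_1 = 0 and k_{j+1} = 2 * k_j + 2."""
    k = 0
    for _ in range(1, j):
        k = 2 * k + 2
    return k

i = 8  # hypothetical major iteration used for the check
raises = {j: w_value(i, j) - w_value(i, 1) for j in range(1, i)}
counts = {j: k_count(j) for j in range(1, i + 1)}
```

Unrolling the recurrence gives $k_j = 2^{j} - 2$, which matches the $2^{i} - 2$ steps claimed for the part of major iteration $i$ in which $w$ is increased.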

Finally, we can provide a proof for Theorem 19, by showing that the
Cottle-Dantzig algorithm will take an exponential number of steps on this family
of games.

Proof (of Theorem 19). The algorithm first spends $n$ steps adding the Min
vertices to the non-negative set. Then, in major iteration $i$, Proposition 22
leads to the same recursion that appears in the proof of Theorem 18. Thus, while
increasing $w$ the algorithm must pass through $T(i - 1) = 2^{i} - 2$ strategies.
The algorithm will then switch the Max vertex with index $i$, which means that it
will traverse $2^{i} - 1$ strategies in total during major iteration $i$. This
gives the total number of iterations used by the algorithm as

\[
n + \sum_{i=1}^{n} (2^{i} - 1) = 2^{n+1} - 2.
\]

□
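
The final count can be confirmed by direct evaluation: the $n$ terms of $-1$ inside the sum cancel the $n$ steps spent on the Min vertices, leaving $\sum_{i=1}^{n} 2^{i} = 2^{n+1} - 2$. The Python sketch below performs the same evaluation; the function name `total_steps` is a hypothetical label, and the code only adds up the per-iteration counts stated in the proof.

```python
def total_steps(n):
    """Add the n trivial Min steps to the 2^i - 1 steps of each major iteration i."""
    min_phase = n
    max_phase = sum(2 ** i - 1 for i in range(1, n + 1))
    return min_phase + max_phase

totals = [total_steps(n) for n in range(1, 13)]
```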



